Fast deterministic detector sampling#78
Conversation
…ng for components determined by single f-variables.
☂️ Python Coverage
Overall Coverage
New FilesNo new covered files... Modified Files
|
|
…faelha/fast_deterministic_detector_sampling
…faelha/fast_deterministic_detector_sampling
…faelha/fast_deterministic_detector_sampling
|
Hi! Could you add a bit more context to this issue? A description of the expected behavior, use case, or any relevant details would help us prioritize and implement it. Thanks! |
6197df5 to
400766b
Compare
…faelha/fast_deterministic_detector_sampling
☂️ Code Coverage
Overall Coverage
New FilesNo new covered files... Modified Files
|
- Introduced a zero-copy fast path in `_CompiledSamplerBase` to optimize sampling when direct f-indices are contiguous, without flips or reindexing. - Updated `compile_program` to consolidate direct entries into a single list, enhancing clarity and efficiency in handling direct components. - Simplified the logic for sorting direct entries and adjusted the output order to align with the original layout, reducing potential reindexing at sample time. - Enhanced the `transform_error_basis` function to streamline the detection of phase variables, improving overall graph processing efficiency.
- Added a new test for the detector sampler that verifies it returns empty arrays when no detectors are present, both with and without a reference sample. - Removed outdated test cases that were previously causing crashes due to empty concatenation.
|
@codex review again |
There was a problem hiding this comment.
Pull request overview
This PR adds a fast path for detector sampling when connected components are deterministically given by a single error-basis variable (f_i), avoiding the JAX compilation + autoregressive sampling pipeline for those components. This targets faster detector sampling on low-noise surface-code-like circuits (per the linked paper).
Changes:
- Detect “direct” connected components (single output equal to one
fvariable, optionally flipped) and represent them explicitly inCompiledProgram. - Add a pure-NumPy sampling fast path for programs consisting only of direct components, plus precomputed output reindexing to avoid per-sample
argsort. - Add a unit test for reference sampling with zero detectors, a benchmark-style test, and documentation/changelog updates.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/tsim/core/graph.py |
Adds direct-component classification and prioritizes output-adjacent parameter vertices during error-basis transform. |
src/tsim/compile/pipeline.py |
Splits compilation into direct vs compiled components; precomputes output permutation (output_reindex). |
src/tsim/core/types.py |
Extends CompiledProgram to carry direct_f_indices, direct_flips, and output_reindex. |
src/tsim/sampler.py |
Uses direct bits in sample_program; adds NumPy-only _sample_direct fast path; updates probability_of. |
src/tsim/circuit.py |
Documents the new detector-sampler fast path behavior. |
test/unit/test_sampler.py |
Adds regression test for use_detector_reference_sample=True with no detectors. |
test/integration/test_sampler.py |
Updates CompiledProgram construction to include new required fields. |
test/unit/benchmarks/test_classical_detector_sampling.py |
Adds a performance threshold test (currently problematic for CI). |
CHANGELOG.md |
Documents the new detector-sampler fast path under Unreleased. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Codex Review: Didn't find any major issues. Another round soon, please! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Code reviewFound 1 issue:
tsim/test/unit/benchmarks/test_classical_detector_sampling.py Lines 8 to 36 in 746b41f 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
PR QuEraComputing#78 moved the matmul_gf2 + prod-over-T calls into per-family modules in terms.py, which dropped the dispatch sites the prior optimization commit relied on. This commit re-plumbs them. terms.py - Per-module CSR fields (NodePhases.params_csr, HalfPiPhases.params_csr, PiProducts.psi_csr/phi_csr, PhasePairs.alpha_csr/beta_csr), static so jit caches by instance identity (D, TSIM_GF2MM_BACKEND=cust_spdn / cust). - Lazy prod-over-T helpers lifted from the pre-refactor evaluate.py: _lazy_node_phases_prod / _lazy_phase_pairs_prod (B, TSIM_TERM_VALS_LAZY=1), and split-coeffs variants (B + SPLIT_COEFFS=1). - Cust prod-over-T kernels: _cust_node_phases_prod / _cust_phase_pairs_prod (C-prod, TSIM_TERM_VALS_CUST_KERNEL=1). - Complex-backend prod-over-T helpers with unroll=True scan and optional c128 cust kernel (H/H2, TSIM_SCALAR_BACKEND=complex128). - Dispatch funnels (_node_phases_prod_dispatch / _phase_pairs_prod_dispatch) resolve flag precedence: complex → cust → split → lazy → original path. - PiProducts.evaluate: cast sum_exponents to int32 before the `1 - 2*x` step. Without the cast, uint promotion wraps (255 in uint8, 2^64-1 in uint64 under JAX_ENABLE_X64=1) and `signed * _IDENTITY` carries garbage. compile.py - Each _compile_* helper calls build_params_csr on its (G, T, P) bitmask and forwards the result. build_params_csr returns None when cust_jax isn't available or G*T < TSIM_GF2MM_MIN_GT, so non-cust builds are unaffected. evaluate.py - make_summands replaces ExactScalarArray for static_phases / float_factor so the choice of scalar backend is centralized. - Complex case folds 2^power2 at to_complex() time (ComplexScalarArray has no power axis to absorb the factor). exact_scalar.py - _USE_COMPLEX128 + make_summands exposed for terms.py and evaluate.py. linalg.py - matmul_gf2 takes optional csr arg. When the csr is non-None and a cust backend is selected, dispatches to matmul_gf2_csr_ffi (cust) or matmul_gf2_csr_spdn_ffi (cust_spdn). Falls back to float32 matmul otherwise. Bit-exact in both paths. Tests: - Default (no flags): 777/777 pass. - B (TSIM_TERM_VALS_LAZY=1): 777/777 pass. - B + SPLIT_COEFFS=1: 777/777 pass. - D fallback (no cust_jax): 777/777 pass. - H (JAX_ENABLE_X64=1 + TSIM_SCALAR_BACKEND=complex128): 776/777 pass — test_seed asserts an exact-count sample histogram which differs by 2 under c128 vs ESA arithmetic (also fails on plain upstream + x64; not introduced here). C-prod / H2 cust kernels require cust_jax and a GPU; the dispatch is in place but not exercised in this env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR contains the Clifford detector sampling fast path used in https://arxiv.org/pdf/2604.01059
Instead of compiling graphs into parametric expressions, one can read of the Tanner graph directly for deterministic detectors.
This significantly speeds up Clifford-only circuits (non-Clifford circuits are only slightly sped up). For example, for the circuits from the Clifft paper I got the following updated numbers:
Note that r=1 coherent noise circuits from Clifft become Clifford-only circuits after ZX reduction in the Tsim compilation step.